Add nds_h validation script #198

yinqingh · 2024-10-10T02:51:27Z

This PR adds a validation script for nds_h queries that compares outputs between CPU and GPU runs. The main checking logic is the same as in nds_validate.py, with the NDS-specific special checks removed.

=== Comparing Query: query4 ===
Collecting rows from DataFrame
Collected 5 rows in 0.16502952575683594 seconds
Collecting rows from DataFrame
Collected 5 rows in 0.173628568649292 seconds
Processed 5 rows
Results match
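The log above reflects a row-by-row comparison of two collected result sets. A minimal sketch of that idea follows, assuming each row is a plain list of values; `mismatched_row_indices` and its `values_equal` parameter are hypothetical names used for illustration, and the sample rows are made up rather than taken from the script or its data.

```python
from itertools import zip_longest


def mismatched_row_indices(cpu_rows, gpu_rows, values_equal=lambda a, b: a == b):
    """Return the indices of rows that differ between two collected result sets."""
    mismatches = []
    for index, (cpu_row, gpu_row) in enumerate(zip_longest(cpu_rows, gpu_rows)):
        # A missing row on either side, or a different column count, is a mismatch.
        if cpu_row is None or gpu_row is None or len(cpu_row) != len(gpu_row):
            mismatches.append(index)
        elif not all(values_equal(c, g) for c, g in zip(cpu_row, gpu_row)):
            mismatches.append(index)
    return mismatches


# Made-up sample data: the second row differs in its last column.
cpu_rows = [["Customer#1", 1, 10], ["Customer#2", 2, 20]]
gpu_rows = [["Customer#1", 1, 10], ["Customer#2", 2, 99]]
errors = mismatched_row_indices(cpu_rows, gpu_rows)
print("Results match" if not errors else f"There were {len(errors)} errors")
```

Passing a tolerant comparison as `values_equal` would let floating-point columns match within a relative tolerance instead of exactly.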

Signed-off-by: Yinqing Hao <[email protected]>

yinqingh · 2024-10-10T02:55:28Z

Observed inconsistent results for query18. This needs further investigation.

Row 22:
['Customer#079420369', 79420369, 15970070725, datetime.date(1995, 1, 15), Decimal('569974.15'), Decimal('324.00')]
['Customer#079420369', 79420369, 7380136135, datetime.date(1995, 1, 15), Decimal('569974.15'), Decimal('324.00')]

Processed 100 rows
There were 1 errors
=== Unmatch Queries: ['query18'] ===

wjxiz1992 · 2024-10-10T09:23:26Z

I reran this query with the same data (we synced offline) and got the same diff. I'm afraid it's a RAPIDS bug; I'll file a bug on the RAPIDS side.

yinqingh · 2024-10-17T03:41:36Z

nds-h/nds_h_validate.py

+    Returns:
+        bool: True if result matches otherwise False
+    """
+    if query_name in SKIP_QUERIES:


Skip checking query15_part1 and query15_part3, since these are create/drop view queries and produce no output.
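As a rough illustration of the skip list, here is a minimal sketch. The two skipped names come from this comment; the helper `queries_to_validate` and the other query names in the sample list (including `query15_part2`) are assumptions for the example, not confirmed contents of the script.

```python
# Queries named in the review comment as producing no output.
SKIP_QUERIES = ["query15_part1", "query15_part3"]


def queries_to_validate(query_names, skip_queries=SKIP_QUERIES):
    """Hypothetical helper: keep only the queries whose output should be compared."""
    return [name for name in query_names if name not in skip_queries]


# Assumed sample list of query names.
all_queries = ["query14", "query15_part1", "query15_part2", "query15_part3", "query16"]
remaining = queries_to_validate(all_queries)
print(remaining)
```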

yinqingh · 2024-10-17T03:44:35Z

nds-h/nds_h_validate.py

+                    use_iterator: bool):
+    # skip output for specific query columns
+    if query_name in SKIP_COLUMNS:
+        df = df.drop(*SKIP_COLUMNS[query_name])


Drop the o_orderkey column from the output of query18 because its results are non-deterministic.
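The effect of dropping the column can be sketched without Spark. The example below assumes rows are plain dicts; `drop_skipped_columns` is a hypothetical stand-in for the `df.drop(*SKIP_COLUMNS[query_name])` call in the diff, and the column names other than `o_orderkey` are assumed labels for the values reported earlier in this thread.

```python
# Per-query columns to ignore; only query18 / o_orderkey is named in this PR.
SKIP_COLUMNS = {"query18": ["o_orderkey"]}


def drop_skipped_columns(query_name, rows, skip_columns=SKIP_COLUMNS):
    """Hypothetical helper: remove the skipped columns from each dict row."""
    skipped = set(skip_columns.get(query_name, []))
    return [{k: v for k, v in row.items() if k not in skipped} for row in rows]


# Values from the mismatching row reported for query18 (column names assumed).
cpu_rows = [{"c_name": "Customer#079420369", "o_orderkey": 15970070725, "o_totalprice": "569974.15"}]
gpu_rows = [{"c_name": "Customer#079420369", "o_orderkey": 7380136135, "o_totalprice": "569974.15"}]

same_before = cpu_rows == gpu_rows
same_after = drop_skipped_columns("query18", cpu_rows) == drop_skipped_columns("query18", gpu_rows)
print(same_before, same_after)
```

With the column ignored, the two rows compare equal, which is the behavior the skip is meant to achieve.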

pxLi · 2024-10-17T03:54:52Z

nds-h/nds_h_validate.py

+import os
+import re
+import time
+from decimal import *


nit: too wide import?

Fixed. Thanks!

pxLi · 2024-10-17T04:00:05Z

nds-h/nds_h_validate.py

+from decimal import *
+
+from pyspark.sql import DataFrame, SparkSession
+from pyspark.sql.types import *


nit: same as above

Fixed. Thanks!

pxLi · 2024-10-17T04:15:51Z

nds-h/nds_h_validate.py

+        df = df.drop(*SKIP_COLUMNS[query_name])
+
+    # apply sorting if specified
+    non_float_cols = [col(field.name) for \


non_float_cols = [col(field.name) for field in df.schema.fields if field.dataType not in (FloatType(), DoubleType())]
float_cols = [col(field.name) for field in df.schema.fields if field.dataType in (FloatType(), DoubleType())]

Fixed. Thanks!
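One plausible use of this split is to sort by the exact-valued columns first and by the floating-point columns last, so small float differences are less likely to reorder rows. That ordering is an assumption here, not something stated in the thread. A Spark-free sketch, where the schema is a list of `(name, python_type)` pairs standing in for `df.schema.fields` and both function names are hypothetical:

```python
def split_sort_keys(schema):
    """Return column names with non-float columns first, float columns last."""
    non_float_cols = [name for name, typ in schema if typ is not float]
    float_cols = [name for name, typ in schema if typ is float]
    return non_float_cols + float_cols


def sort_rows(rows, schema):
    """Sort dict rows using the non-float-first key order."""
    order = split_sort_keys(schema)
    return sorted(rows, key=lambda row: tuple(row[name] for name in order))


# Made-up schema and rows for illustration.
schema = [("price", float), ("name", str), ("qty", int)]
rows = [
    {"price": 1.5, "name": "b", "qty": 1},
    {"price": 9.0, "name": "a", "qty": 2},
]
key_order = split_sort_keys(schema)
sorted_names = [row["name"] for row in sort_rows(rows, schema)]
print(key_order, sorted_names)
```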

pxLi · 2024-10-17T04:18:38Z

nds-h/nds_h_validate.py

+            return math.isclose(expected, actual, rel_tol=epsilon)
+    elif isinstance(expected, str) and isinstance(actual, str):
+        return expected == actual
+    elif expected == None and actual == None:


These cases are already covered by the `return expected == actual` below.

I updated this function to make it more readable, and I think it should behave the same as the previous one. Please let me know if any special cases are not covered by this function. Thanks! cc @wjxiz1992

def compare(expected, actual, epsilon=0.00001):
    #TODO 1: we can optimize this with case-match after Python 3.10
    #TODO 2: we can support complex data types like nested type if needed in the future.
    # now NDS only contains simple data types.
    if isinstance(expected, float) and isinstance(actual, float):
        # Double is converted to float in pyspark...
        if math.isnan(expected) and math.isnan(actual):
            return True
        return math.isclose(expected, actual, rel_tol=epsilon)
    if isinstance(expected, Decimal) and isinstance(actual, Decimal):
        return math.isclose(expected, actual, rel_tol=epsilon)
    return expected == actual
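To check which special cases the simplified function covers, the sketch below runs a self-contained copy of it against a few inputs. The function body follows the comment above; the sample inputs and the `checks` dict are illustrative additions.

```python
import math
from decimal import Decimal


def compare(expected, actual, epsilon=0.00001):
    if isinstance(expected, float) and isinstance(actual, float):
        # NaN never equals itself, so treat two NaNs as a match explicitly.
        if math.isnan(expected) and math.isnan(actual):
            return True
        return math.isclose(expected, actual, rel_tol=epsilon)
    if isinstance(expected, Decimal) and isinstance(actual, Decimal):
        return math.isclose(expected, actual, rel_tol=epsilon)
    # Strings, ints, dates, None and mixed types fall through to exact equality.
    return expected == actual


checks = {
    "nan_vs_nan": compare(float("nan"), float("nan")),
    "close_floats": compare(1.0, 1.000001),
    "distant_floats": compare(1.0, 1.1),
    "close_decimals": compare(Decimal("569974.15"), Decimal("569974.16")),
    "none_vs_none": compare(None, None),
    "none_vs_float": compare(None, 1.0),
    "equal_strings": compare("Customer#079420369", "Customer#079420369"),
    "different_ints": compare(15970070725, 7380136135),
}
for name, result in checks.items():
    print(name, result)
```

The `None` and string cases that had explicit branches in the earlier version are handled by the final `expected == actual`.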

pxLi · 2024-10-17T04:21:40Z

nds-h/nds_h_validate.py

+            raise Exception(f"More than one summary file found for query {query_name} in folder {prefix}.")
+        if len(file_glob) == 0:
+            raise Exception(f"No summary file found for query {query_name} in folder {prefix}.")
+        for filename in file_glob:


if len(file_glob) > 1: should have already errored out.

Updated. Thanks!
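Since both the "more than one" and "none" cases raise, exactly one file remains afterwards and no loop is needed. A minimal sketch of that shape follows; the function name `find_summary_file` and the `<query_name>-*.json` file-name pattern are hypothetical, as the thread does not show how the script builds its glob.

```python
import glob
import os
import tempfile


def find_summary_file(prefix, query_name):
    """Return the single summary file for a query, or raise if there is not exactly one."""
    # Hypothetical naming pattern: "<query_name>-<anything>.json" inside `prefix`.
    file_glob = glob.glob(os.path.join(prefix, f"{query_name}-*.json"))
    if len(file_glob) > 1:
        raise Exception(f"More than one summary file found for query {query_name} in folder {prefix}.")
    if len(file_glob) == 0:
        raise Exception(f"No summary file found for query {query_name} in folder {prefix}.")
    # Exactly one match is left at this point.
    return file_glob[0]


with tempfile.TemporaryDirectory() as folder:
    # Create one made-up summary file for query4 and none for query18.
    open(os.path.join(folder, "query4-1728528687.json"), "w").close()
    found = os.path.basename(find_summary_file(folder, "query4"))
    try:
        find_summary_file(folder, "query18")
        missing_error = None
    except Exception as error:
        missing_error = str(error)
```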

wjxiz1992

LGTM

Add nds_h validation script

31e6efa

Signed-off-by: Yinqing Hao <[email protected]>

ignore output of column o_orderkey for query18

574de5a

Signed-off-by: Yinqing Hao <[email protected]>

yinqingh force-pushed the yinqing-ndsh-validate branch from a3cb119 to 574de5a Compare October 17, 2024 03:35

yinqingh commented Oct 17, 2024

yinqingh changed the title ~~[WIP] Add nds_h validation script~~ Add nds_h validation script Oct 17, 2024

yinqingh requested review from mattahrens, pxLi, wjxiz1992 and GaryShen2008 October 17, 2024 03:45

pxLi reviewed Oct 17, 2024

Fix compare function

25a0cdd

Signed-off-by: Yinqing Hao <[email protected]>

yinqingh force-pushed the yinqing-ndsh-validate branch from 56fc6f2 to 25a0cdd Compare October 17, 2024 05:55

mattahrens approved these changes Oct 22, 2024

wjxiz1992 approved these changes Oct 23, 2024

wjxiz1992 merged commit a263915 into NVIDIA:dev Oct 23, 2024
2 checks passed

yinqingh deleted the yinqing-ndsh-validate branch October 23, 2024 03:28
